CHAPTER 16 Getting Straight Talk on Straight-Line Regression 227
But the p value for the slope is very important. Assuming α = 0.05, if it’s less than
0.05, it means that the slope of the fitted straight line is statistically significantly
different from zero. This means that the X and Y variables are statistically signifi-
cantly associated with each other. A p value greater than 0.05 would indicate that
the true slope could equal zero, and there would be no conclusive evidence for a
statistically significant association between X and Y. In Figure 16-4, the p value for
the slope is 0.0127, which means that the slope is statistically significantly differ-
ent from zero. This tells you that in your model, body weight is statistically sig-
nificantly associated with SBP.
If you want to test for a significant correlation between two variables at α = 0.05, you
can look at the p value for the slope of the least-squares straight line. If it’s less
than 0.05, then the X and Y variables are also statistically significantly correlated.
The p value for the significance of the slope in a straight-line regression is always
exactly the same as the p value for the correlation test of whether r is statistically
significantly different from zero, as described in Chapter 15.
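To see why the two p values have to match, here is a minimal Python sketch. The weight and SBP numbers are made up for illustration (they are not the data behind Figure 16-4), and the function name is hypothetical. The sketch computes the t statistic for the slope and the t statistic for Pearson r by hand. Both tests use n − 2 degrees of freedom, so identical t statistics give identical p values.

```python
import math

# Hypothetical data, invented for illustration (not the data behind Figure 16-4).
weight = [60.0, 72.0, 81.0, 68.0, 95.0, 77.0, 88.0, 64.0]
sbp = [118.0, 125.0, 138.0, 121.0, 142.0, 127.0, 131.0, 124.0]


def slope_and_correlation_t(x, y):
    """Return (t for the slope, t for Pearson r, degrees of freedom)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Least-squares slope and its standard error
    slope = sxy / sxx
    ss_residual = syy - slope * sxy
    df = n - 2
    se_slope = math.sqrt(ss_residual / df / sxx)
    t_slope = slope / se_slope

    # Pearson r and the t statistic for testing whether r differs from zero
    r = sxy / math.sqrt(sxx * syy)
    t_r = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t_slope, t_r, df


t_slope, t_r, df = slope_and_correlation_t(weight, sbp)
print(f"t for slope = {t_slope:.6f}, t for r = {t_r:.6f}, df = {df}")
```

The two printed t values agree apart from floating-point rounding, which is why you can read the correlation test straight off the slope's p value.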
Wrapping up with measures of goodness-of-fit
The last few lines of output in Figure 16-4 contain several indicators of how
well the straight line represents the data. The following sections describe this part
of the output.
The correlation coefficient
Most straight-line regression programs provide the classic Pearson r correlation
coefficient between X and Y (see Chapter 15 for details). But the program may
provide you the correlation coefficient in a roundabout way by outputting r² rather
than r itself. In Figure 16-4, at the bottom under Multiple R-squared, the r² is listed
as 0.2984. If you want Pearson r, just use Microsoft Excel or a calculator to take
the square root of 0.2984 to get 0.546.
The r² is never negative, because the square of any number is never negative. But the
correlation coefficient can be positive or negative, depending on whether the
fitted line slopes upward or downward. If the fitted line slopes downward, make
your r value negative.
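If you'd rather not do the square root and the sign fix by hand, the two steps can be sketched in a few lines of Python. The function name is hypothetical, and the slope values passed in are placeholders, because only the sign of the slope matters here.

```python
import math


def pearson_r_from_r_squared(r_squared, slope):
    """Recover Pearson r from r-squared, taking the sign from the slope."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must be between 0 and 1")
    return math.copysign(math.sqrt(r_squared), slope)


# 0.2984 is the Multiple R-squared value quoted in the text.
# The slopes (+1.0 and -1.0) are placeholders; only their sign is used.
r_up = pearson_r_from_r_squared(0.2984, slope=1.0)
r_down = pearson_r_from_r_squared(0.2984, slope=-1.0)
print(round(r_up, 3), round(r_down, 3))  # prints: 0.546 -0.546
```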
Why did the program give you r² instead of r in the first place? It’s because r² is a
useful estimate called the coefficient of determination. It tells you what percent of
the total variability in the Y variable can be explained by the fitted line.
» An r² value of 1 means that the points lie exactly on the fitted line, with no
scatter at all.